NSF PAR Search | NSF Public Access Repository


WebInject: Prompt Injection Attack to Web Agents

Wang, Xilong; Bloch, John; Shao, Zedian; Hu, Yuepeng; Zhou, Shuyan; Gong, Neil Zhenqiang (October 2025, Conference on Empirical Methods in Natural Language Processing (EMNLP))

Free, publicly-accessible full text available October 5, 2026
PAL: Program-aided Language Models

Gao, Luyu; Madaan, Aman; Zhou, Shuyan; Alon, Uri; Liu, Pengfei; Yang, Yiming; Callan, Jamie; Neubig, Graham (July 2023, Proceedings of the 40th International Conference on Machine Learning)

Large language models (LLMs) have demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks, when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, LLMs often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and others. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on GSM8K, surpassing PaLM which uses chain-of-thought by absolute 15% top-1.
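The division of labor described in this abstract can be illustrated with a minimal sketch: the language model writes a program, and the Python interpreter computes the answer. Everything below is an assumption made for illustration and is not taken from the paper: the word problem, the `fake_llm_generate_program` stand-in (which returns a hard-coded program instead of calling a real model), and the `run_program` helper.

```python
# Minimal sketch of the program-aided idea: the model only decomposes the
# problem into code; the interpreter does the arithmetic.

QUESTION = (
    "A bakery bakes 200 loaves. It sells 93 in the morning and 39 in the "
    "afternoon, and 6 loaves are returned. How many loaves are left?"
)


def fake_llm_generate_program(question: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real system would prompt a code model with few-shot examples.
    Here the "generated" program is hard-coded for the question above.
    """
    return (
        "def solution():\n"
        "    loaves_baked = 200\n"
        "    sold_morning = 93\n"
        "    sold_afternoon = 39\n"
        "    returned = 6\n"
        "    return loaves_baked - sold_morning - sold_afternoon + returned\n"
    )


def run_program(program: str):
    """Execute the generated program and return the value of solution()."""
    namespace = {}
    # exec on untrusted model output is unsafe; a real system needs a sandbox.
    exec(program, namespace)
    return namespace["solution"]()


answer = run_program(fake_llm_generate_program(QUESTION))
print(answer)
```

Because the arithmetic runs in the interpreter, a correct decomposition cannot be spoiled by a calculation slip, which is the failure mode the abstract attributes to chain-of-thought prompting.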
Full Text Available
Causal Reasoning of Entities and Events in Procedural Texts

https://doi.org/10.18653/v1/2023.findings-eacl.31

Zhang, Li; Xu, Hainiu; Yang, Yue; Zhou, Shuyan; You, Weiqiu; Arora, Manni; Callison-Burch, Chris (May 2023, Findings of the Association for Computational Linguistics: EACL 2023)

Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind human at .87 F1. We boost model performance to .59 F1 by creatively representing events as programming languages while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE brings for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.
Full Text Available
MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

Wang, Zhiruo; Cuenca, Grace; Zhou, Shuyan; Xu, Frank F.; Neubig, Graham (January 2023, Findings of the Conference of the European Chapter of the Association for Computational Linguistics)

Full Text Available
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data

https://doi.org/10.18653/v1/2022.acl-long.214

Zhou, Shuyan; Zhang, Li; Yang, Yue; Lyu, Qing; Yin, Pengcheng; Callison-Burch, Chris; Neubig, Graham (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Procedures are inherently hierarchical. To “make videos”, one may need to “purchase a camera”, which in turn may require one to “set a budget”. While such hierarchical knowledge is critical for reasoning about complex procedures, most existing work has treated procedures as shallow structures without modeling the parent-child relation. In this work, we attempt to construct an open-domain hierarchical knowledge-base (KB) of procedures based on wikiHow, a website containing more than 110k instructional articles, each documenting the steps to carry out a complex procedure. To this end, we develop a simple and efficient method that links steps (e.g., “purchase a camera”) in an article to other articles with similar goals (e.g., “how to choose a camera”), recursively constructing the KB. Our method significantly outperforms several strong baselines according to automatic evaluation, human judgment, and application to downstream tasks such as instructional video retrieval.
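The recursive linking described in this abstract can be sketched on toy data. This is only an illustration under stated assumptions: the two-article corpus, the extra steps "edit the footage" and "read reviews", the stopword list, the 0.3 threshold, and the use of Jaccard word overlap as the similarity measure are all hypothetical and stand in for the paper's actual linking method, which the abstract does not specify.

```python
# Toy sketch: link each step to the article whose goal is most similar,
# then expand that article's steps recursively to build a hierarchy.

STOPWORDS = {"a", "an", "the", "to", "how"}

# Hypothetical corpus: article title (goal) -> list of steps.
ARTICLES = {
    "how to make videos": ["purchase a camera", "edit the footage"],
    "how to choose a camera": ["set a budget", "read reviews"],
}


def tokens(text):
    """Lowercased content words of a step or article title."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Word-overlap similarity, an assumed stand-in for a learned matcher."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def link_step(step, exclude, threshold=0.3):
    """Return the best-matching article title not in `exclude`, or None."""
    best, best_score = None, threshold
    for title in ARTICLES:
        if title in exclude:
            continue
        score = jaccard(step, title)
        if score >= best_score:
            best, best_score = title, score
    return best


def build_hierarchy(goal, visited=frozenset()):
    """Recursively map each step of `goal` to the sub-steps of its linked article."""
    visited = visited | {goal}  # prevents cycles between articles
    tree = {}
    for step in ARTICLES[goal]:
        target = link_step(step, visited)
        tree[step] = build_hierarchy(target, visited) if target else {}
    return tree


hierarchy = build_hierarchy("how to make videos")
print(hierarchy)
```

On this toy corpus, "purchase a camera" links to "how to choose a camera", so "set a budget" becomes a grandchild of "make videos", mirroring the example in the abstract.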
Full Text Available
Improving Robustness of Neural Machine Translation with Multi-task Learning

https://doi.org/10.18653/v1/W19-5368

Zhou, Shuyan; Zeng, Xiangkai; Zhou, Yingqi; Anastasopoulos, Antonios; Neubig, Graham (January 2019, Proceedings of the Fourth Conference on Machine Translation)

Full Text Available
